{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font color=black size=5 face=雅黑>**GML(Gradual Machine Learning)框架接口文档说明**</font>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "用户在使用GML框架进行推理时，需要提供以下三个部分的数据：             \n",
    "1. 变量          \n",
    "2. 特征               \n",
    "3. 简单实例       \n",
    "     "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font color=black size=4 face=雅黑>**其主要数据结构定义如下:**</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#1.变量(variables)数据结构定义\n",
    "variables=list() #元素类型为dict:variable\n",
    "variable =\n",
    "{\n",
    "    #由用户提供的属性\n",
    "    'var_id':1,                  #变量ID   int类型，从0开始\n",
    "    'is_easy': False,            #是否是Easy\n",
    "    'is_evidence':True,          #是否是证据变量\n",
    "    'true_label':1,              #真实标签，用于后续计算精度等\n",
    "    'label': 1，                 #推理出的标签:0为negative,1为为positive, -1为不知道\n",
    "    'feature_set':               #此变量拥有的所有feature\n",
    "    { feature_id1: [theta1,feature_value1],\n",
    "      feature_id2: [theta2,feature_value2],\n",
    "      ...\n",
    "    },\n",
    "    #代码运行期间可能会自动生成的属性\n",
    "    'probability':0.99，          #推理出的概率\n",
    "    'evidential_support':0.3,     #evidential_support\n",
    "    'entropy': 0.4,               #熵\n",
    "    'approximate_weight':0.3,     #近似权重\n",
    "     ...\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#2.特征(feature)数据结构定义\n",
    "features  = list()      #元素类型为dict:feature\n",
    "feature =\n",
    "{\n",
    "    #需要用户提供的属性\n",
    "    'feature_id':1,                                       #特征id, int类型，从0开始\n",
    "    'feature_type': unary_feature/binary_feature,         #区分此特征是单因子特征还是双因子特征，目前支持unary_feature和binary_feature两种\n",
    "    'feature_name': good,                                 #特征名，如果是token类型，就是token具体的词，如果是其他类型的特征，就是特征的具体类型\n",
    "    'weight':                                             #拥有此特征的所有变量的集合\n",
    "    {\n",
    "      var_id1:        [weight_value1,feature_value1],     #unary_feature\n",
    "     (varid3,varid4): [weight_value2,feature_value2],     #binary_feature\n",
    "      ...\n",
    "    }\n",
    "    #代码运行期间可能会自动生成的属性\n",
    "    'tau':0,\n",
    "    'alpha':0,\n",
    "    'regerssion'： object,\n",
    "    ...\n",
    "}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#3. 简单实例(Easy)数据结构定义 用户也可以直接在variables标明Easy而不提供此文件\n",
    "easys = list() #元素类型为dict:easy\n",
    "easy  = \n",
    "{\n",
    "    'var_id': 0,\n",
    "    'label':  1\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<font color=black size=4 face=雅黑>**在准备好所需数据之后，主要的调用流程如下：**</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gml import GML\n",
    "from easy_instance_labeling import  EasyInstanceLabeling\n",
    "from evidential_support import EvidentialSupport\n",
    "from evidence_select import EvidenceSelect\n",
    "from approximate_probability_estimation import ApproximateProbabilityEstimation\n",
    "\n",
    "EasyInstanceLabeling(variables,features,easys).label_easy_by_file()\n",
    "graph = GML(variables, features, evidential_support_method, approximate_probability_method,evidence_select_method, top_m=2000, top_k=10, update_proportion=0.01,\n",
    "                 balance = False)\n",
    "graph.inference()            #因子图推理\n",
    "graph.score()                #获取推理结果"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font color=black size=4 face=雅黑>**目前gml模块中主要的类和函数的详细功能设计如下：**</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#gml类，gml主要流程\n",
    "Class GML(     \n",
    " variables,                         #变量\n",
    " features,                           #特征\n",
    " evidential_support_method,                  \n",
    " #计算evidential support的方法，目前支持两种：regression 和 relation\n",
    " approximate_probability_method,\n",
    "#计算近似概率的方法，目前支持两种：interval 和 relation\n",
    " evidence_select_method,\n",
    "#为待推隐变量挑选证据的方法，目前支持两种：interval 和 relation\n",
    " top_m=2000,                                    #topm\n",
    " top_k=10,                                          #topk\n",
    " update_proportion=0.01,                 #更新比例\n",
    " balance = False                                 #是否进行平衡化\n",
    " )\n",
    "#GML类主要函数如下：\n",
    "1.evidential_support()                                           #计算evidential support\n",
    "2.approximate_probability_estimation()              #计算近似概率\n",
    "3.select_top_m_by_es()                                         #选topm\n",
    "4.select_top_k_by_entropy()                                 #选topk\n",
    "5.select_evidence()                                               #挑选证据\n",
    "6.construct_subgraph()                                        #构建子图\n",
    "7.inference_subgraph()                                        #推理子图\n",
    "8.label()                                                                #从k个变量中选一个进行标记\n",
    "9.inference()                                                         #主流程，贯穿以上"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#简单实例标注类\n",
    "Class EasyInstanceLabeling(variables, features, easys)\n",
    "#此类目前主要提供以下方法：\n",
    "1.label_easy_by_file()                          #根据给定的easy文件标easy\n",
    "2.label_easy_by_clustering()                     #实体识别中标Easy的方法\n",
    "3.label_easy_by_custom()                         #可由用户自行实现\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#近似概率计算类\n",
    "Class ApproximateProbabilityEstimation(variables)   \n",
    "#此类目前主要提供以下方法：\n",
    "1.approximate_probability_estimation_by_interval()     #实体识别所用方法\n",
    "2.approximate_probability_estimation_by_relation()    #情感分析所有方法\n",
    "3.approximate_probability_estimation_by_custom()      #可由用户自行实现\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "4.Class EvidentialSupport(\t\t\t# 计算Evidential Support的类\n",
    "variables,              \t\t\t\t\t# 变量\n",
    "features,               \t\t\t\t\t# 特征\n",
    "method,               \t\t\t\t\t# 计算Evidential Support的方法\n",
    "evidence_interval_count=10,              \t# 区间数\n",
    "interval_evidence_count=200            \t# 每个区间的变量数\n",
    ")\n",
    "此类目前主要提供以下方法：\n",
    "1.evidential_support_by_regression()            \n",
    "# 计算Evidential Support的方法之一，适用于实体识别\n",
    "2.evidential_support_by_relation()         \n",
    "# 计算Evidential Support的方法之一，适用于情感分析\n",
    "3.evidential_support_by_custom()        \t \n",
    "# 用户自定义计算Evidential Support的方法\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "5.Class EvidenceSelect(\t\t\t\t# 挑选证据变量的类\n",
    "variables,\t\t\t\t\t\t\t# 变量\n",
    "features,\t\t\t\t\t\t\t# 特征\n",
    "interval_evidence_count,\t\t\t\t# 每个区间挑选的证据个数\n",
    "subgraph_limit_num,\t\t\t\t# 子图变量上限\n",
    "k_hop\t\t\t\t\t\t\t# 相邻变量的跳数（用于情感分析）\n",
    ")\n",
    "此类目前主要提供以下方法：\n",
    "1.select_evidence_by_interval()\t\t\t#用于实体识别\n",
    "2.select_evidence_by_realtion()\t\t\t#用于情感分析\n",
    "3.select_evidence_by_custom()\t\t\t#用户自定义\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "6.Class ConstructSubgraph(\n",
    "variables, \n",
    "features,\n",
    "balance                                                         #是否在建图的时候平衡话\n",
    ")\n",
    "此类目前主要提供以下方法：\n",
    "1.construct_subgraph_for_unaryPara()    #只有单因子（需要参数化），适用于ER\n",
    "2.construct_subgraph_for_mixture()       #有单双因子(均不需要参数化),适用于ALSA\n",
    "3.construct_subgraph_for_custom()        #用户自定义\n",
    "\n",
    "\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}